[vslib] Fix VS SAI reporting 0xFFFFFFFF oper speed for virtio NICs#1763
[vslib] Fix VS SAI reporting 0xFFFFFFFF oper speed for virtio NICs#1763rustiqly wants to merge 1 commit intosonic-net:masterfrom
Conversation
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Companion PR: sonic-net/sonic-buildimage#25428 (enables |
|
@rustiqly , can you also add unit test for this PR? |
70620e1 to
1580a71
Compare
|
Added 3 unit tests in
Also building a VS image with both fixes to verify end-to-end on KVM. |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
1580a71 to
400fff5
Compare
|
/azp run |
|
@lguohan Fixed — the CI failure was |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Pull request overview
Fixes incorrect reporting of SAI_PORT_ATTR_OPER_SPEED in VS when sysfs returns “unknown” speed (e.g., -1 on virtio), avoiding the 0xFFFFFFFF wraparound and using configured port speed as a fallback.
Changes:
- Update
vs_get_oper_speed()to read sysfs speed as signed and reject invalid values (<= 0) with a warning. - Update
refresh_port_oper_speed()to fall back toSAI_PORT_ATTR_SPEEDwhen operational speed can’t be read. - Add unit tests intended to cover configured-speed and fallback behavior; update spellcheck word list.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| vslib/SwitchStateBaseHostif.cpp | Reads sysfs speed as int32_t and rejects invalid/unknown values before assigning to uint32_t. |
| vslib/SwitchStateBase.cpp | Falls back to configured port speed when operational speed read fails. |
| unittest/vslib/TestSwitchBCM56850.cpp | Adds tests for oper-speed refresh scenarios (but currently calls a protected method directly). |
| tests/aspell.en.pws | Adds new words used by comments/logs/tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
400fff5 to
9ce11d3
Compare
|
@lguohan Found it — the failing test was |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Commenter does not have sufficient privileges for PR 1763 in repo sonic-net/sonic-sairedis |
|
@lguohan The CI failure is All our tests passed: Triggered a rerun. |
9ce11d3 to
1b69b8e
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
CI failures are redis socket errors in vstest harness ("Unable to connect to redis (unix-socket)") — known infra flakiness unrelated to this change. All p4rt tests passed. Re-triggering. /azp run |
699f972 to
d189ce6
Compare
|
/azp run |
|
@lguohan Rebased to latest master (d189ce6). The
Could you take another look? Thanks! |
|
Azure Pipelines successfully started running 1 pipeline(s). |
VS Image Build + Test ResultsBuilt a VS image with this PR commit (d189ce6) on latest Build: Bug trigger confirmed: sysfs reports sai.profile active (companion PR #25428 merged):
No error logs: 0 matches for 3 ports oper=up, all showing correct 40G configured speed. |
d189ce6 to
094830d
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
When running on KVM/virtio, /sys/class/net/ethN/speed returns -1 (unknown). vs_get_oper_speed() reads this into uint32_t, which wraps to 4294967295 (0xFFFFFFFF) and gets reported as the oper speed. Fix: - Read sysfs speed as int32_t and check for <= 0 (invalid) - When invalid, log a warning, set speed=0, and return false - In refresh_port_oper_speed(), set attr.value.u32=0 (unknown) instead of returning SAI_STATUS_FAILURE - When m_useConfiguredSpeedAsOperSpeed is true, configured speed is used as oper speed (bypassing sysfs entirely) This ensures VS ports show 0 (unknown) rather than garbage values on virtual NICs. With the companion sai.profile change (#25428), VS ports will show the correct configured speed. Added unit tests: - test_refresh_port_oper_speed_configured_speed: verifies oper speed equals configured speed when m_useConfiguredSpeedAsOperSpeed=true - test_refresh_port_oper_speed_down_port: verifies oper speed is 0 for operationally down ports - test_refresh_port_oper_speed_fallback_no_tap: verifies fallback when vs_get_oper_speed fails (no TAP device) Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>
094830d to
7c239ff
Compare
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
Fix for BuildAsan FlexCounter.bulkChunksize failurePushed an incremental commit ( Root CauseUnder Asan, the test thread runs 2-5x slower while the FlexCounter worker thread polls at its normal 1000ms interval. This creates a race:
Result: FixAdded an
This eliminates the timing dependency entirely — no sleeps, no increased timeouts, just explicit synchronization. |
|
/azp run |
1e3b4ea to
7c239ff
Compare
|
Azure Pipelines will not run the associated pipelines, because the pull request was updated after the run command was issued. Review the pull request again and issue a new run command. |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
…oll-wait ## Description Fix flaky `FlexCounter.bulkChunksize` unit test that intermittently fails in CI due to timing-dependent `usleep(1000*1050)`. **Issue:** sonic-net#1765 ## Root Cause The test uses `usleep(1000*1050)` (1.05s) to wait for the FlexCounter polling thread (1s poll interval) to complete counter collection. Under CI load, the polling thread may not finish within this window, causing assertion failures like: ``` TestFlexCounter.cpp:1390: Failure Expected equality of these values: object_count Which is: 6 unifiedBulkChunkSize Which is: 3 ``` This failure reproduces on completely unrelated PRs ([sonic-net#1763](sonic-net#1763), [sonic-net#1764](sonic-net#1764)), confirming it is timing-dependent and not caused by code changes. ## Fix Replace all `usleep(1000*1050)` calls with deterministic poll-wait helpers: - **`waitForCounterKeys(table, expectedCount)`** — polls until the expected number of counter keys appear in COUNTERS_DB - **`waitForCounterValue(table, key, field)`** — polls until a specific counter field has a non-empty value Both helpers poll every 100ms with a 5-second timeout. This eliminates timing sensitivity while keeping tests fast on unloaded machines (typically completes in 1-2 polls). Also replaces the similar `usleep(1000*1000)` and `usleep(1000*2000)` in `queryCounterCapability` / `addRemoveCounterForPort` which have the same timing vulnerability. ## Changes - `unittest/syncd/TestFlexCounter.cpp`: - Add `#include <chrono>` for `steady_clock` - Add `waitForCounterKeys()` helper - Add `waitForCounterValue()` helper - Replace 8x `usleep(1000*1050)`, 1x `usleep(1000*1000)`, 1x `usleep(1000*2000)` across `testAddRemoveCounter`, `FlexCounter.counterIdChange`, `FlexCounter.addRemoveCounterForPort`, and `testDashMeterAddRemoveCounter` - Retain `usleep(60*1000)` in `updateSwitchDebugCounterIdList` (50ms poll interval, not flaky) Signed-off-by: Sonic Build Admin <sonicbld@microsoft.com>
What I did
[agent]
When running SONiC VS on KVM/virtio,
/sys/class/net/ethN/speedreturns-1(unknown speed).vs_get_oper_speed()reads this into auint32_t, which wraps to4294967295(0xFFFFFFFF) and gets reported asSAI_PORT_ATTR_OPER_SPEED. This causesshow interfaces statusto display4294967.3Gas the port speed for operationally up ports.How I did it
vs_get_oper_speed(): Read sysfs speed asint32_tinstead of directly intouint32_t. Check for<= 0(invalid) and returnfalsewith a warning log.refresh_port_oper_speed(): Whenvs_get_oper_speed()fails, fall back toSAI_PORT_ATTR_SPEED(configured speed from CONFIG_DB) instead of returningSAI_STATUS_FAILURE.How to verify it
show interfaces statusshows4294967.3Gfor Ethernet0/4/840G)Or verify directly:
Previous command output (if the output of a command-Loss, currentError etc has changed)
Before:
4294967.3GAfter:
40GSigned-off-by: Rustiqly rustiqly@users.noreply.github.com